Universal Dependencies for Japanese Based on Long-Unit Words by NINJAL

Abstract

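Universal Dependencies (UD) is a project to build treebanks based on word-level dependency structures across languages. Under criteria unified across all languages, part-of-speech and dependency annotation data are being constructed for more than 100 languages. For languages that are written without spaces between words, the syntactic word that serves as the basic unit must be defined. Previous Japanese UD data adopted the NINJAL Short Unit Word (国語研短単位), a unit defined on morphological grounds. We report the construction of new Japanese UD treebanks, UD_Japanese-GSDLUW, UD_Japanese-PUDLUW, and UD_Japanese-BCCWJLUW, based on the NINJAL Long Unit Word (国語研長単位), a word unit that is closer to the syntactic word.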

Similar resources

Universal Dependencies for Japanese

We present an attempt to port the international syntactic annotation scheme, Universal Dependencies, to the Japanese language in this paper. Since the Japanese syntactic structure is usually annotated on the basis of unique chunk-based dependencies, we first introduce word-based dependencies by using a word unit called the Short Unit Word, which usually corresponds to an entry in the lexicon Un...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Long-Distance Dependencies and Applicative Universal Grammar

To deal with long-distance dependencies, Applicative Universal Grammar (AUG) proposes a new type of categorial rules, called superposition rules. We compare the AUG rules with the alternative rules of Steedman's Combinatory Categorial Grammar (CCG) (Steedman, 1987, 1988, 1990; Szabolcsi, 1987; Ades and Steedman, 1982). In contrast to Steedman's rules, the AUG rules are free from inconsistencies...

'BonTen' - Corpus Concordance System for 'NINJAL Web Japanese Corpus'

The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a web corpus for linguistic research comprising 25 billion words. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents a corpus concordance system named...

Universal Dependencies for Greek

This paper describes work towards the harmonization of the Greek Dependency Treebank with the Universal Dependencies v2 standard, and the extension of the treebank with enhanced dependencies. Experiments with the latest version of the UD_Greek resource have led to 88.94/87.66 LAS on gold/automatic POS, morphological features and lemmas.

Journal

Journal title: Shizen gengo shori

Year: 2023

ISSN: 1340-7619, 2185-8314

DOI: https://doi.org/10.5715/jnlp.30.4